Designing HMMs: Motif discovery and modeling
Abstract
Position-specific scoring matrices (PSSMs) capture the distribution of residues observed at each position of a conserved motif, but they are poorly suited to modeling variable-length motifs, recognizing new instances that contain insertions and deletions, and capturing positional dependencies. Moreover, PSSMs can be used to search for instances of an ungapped motif in an unlabeled sequence, but they do not lend themselves to precise boundary detection. We turned to hidden Markov models (HMMs) to address these limitations. HMMs provide a flexible and expressive formalism for modeling conserved sequence motifs. In addition to modeling precise conserved motifs, like the WEIRD motif, HMMs can also be used to model biologically distinct regions that are characterized by a change in underlying sequence composition rather than by a precise pattern. Examples include transmembrane regions, which are enriched for hydrophobic residues, and CpG islands, which have higher GC content.
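As a rough illustration of the composition-based case, the sketch below decodes a DNA sequence with a two-state HMM using the Viterbi algorithm in log space. It is not taken from the article: the state names ("B" for background, "I" for a GC-rich island), the function name `viterbi`, and every probability are hypothetical values chosen only to make the example run.

```python
import math

# Hypothetical two-state HMM: "B" = background, "I" = GC-rich island.
# All probabilities below are illustrative assumptions, not fitted values.
STATES = ("B", "I")
START = {"B": 0.5, "I": 0.5}
TRANS = {
    "B": {"B": 0.9, "I": 0.1},
    "I": {"B": 0.1, "I": 0.9},
}
EMIT = {
    "B": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "I": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq):
    """Return the most probable state path for seq as a string of state names."""
    if not seq:
        return ""
    # score[s] = best log probability of any state path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for symbol in seq[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][symbol])
            )
            pointer[s] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state to recover the full path
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    sequence = "AAAAAAAA" + "GCGCGCGCGCGC" + "AAAAAAAA"
    print(sequence)
    print(viterbi(sequence))
```

With these assumed parameters, the decoded path switches state where the composition changes, so the region boundaries come directly from the labeling. This is the kind of boundary detection that a fixed-width PSSM scan does not provide.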
Similar resources
Meta-MEME: motif-based hidden Markov models of protein families
MOTIVATION Modeling families of related biological sequences using Hidden Markov models (HMMs), although increasingly widespread, faces at least one major problem: because of the complexity of these mathematical models, they require a relatively large training set in order to accurately recognize a given family. For families in which there are few known sequences, a standard linear HMM contains...
The Effects of Ordered-Series-of-Motifs Anchoring and Sub-Class Modeling on the Generation of HMMs Representing Highly Divergent Protein Sequences
Hidden Markov Models (HMMs) provide a flexible method for representing protein sequence data. Highly divergent data require a more complex approach to HMM generation than previously demonstrated. We describe a strategy of motif anchoring and sub-class modeling that aids in the construction of more informative HMMs as determined by a new algorithm called a stability measure.
Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences
This work presents a hybrid method for motif discovery in DNA sequences. The proposed method, called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...
Is Lead Investigator on One National Peer-reviewed Grant: Yes Grant Information: NIH R01 EB007057 Machine Learning Analysis of Tandem Mass Spectra 3/1/07--2/28/11
Motivation: Modeling families of related biological sequences using Hidden Markov models (HMMs), although increasingly widespread, faces at least one major problem: because of the complexity of these mathematical models, they require a relatively large training set in order to accurately recognize a given family. For families in which there are few known sequences, a standard ...
HH-MOTiF: de novo detection of short linear motifs in proteins by Hidden Markov Model comparisons
Short linear motifs (SLiMs) in proteins are self-sufficient functional sequences that specify interaction sites for other molecules and thus mediate a multitude of functions. Computational as well as experimental biological research would benefit significantly if SLiMs in proteins could be correctly predicted de novo with high sensitivity. However, de novo SLiM prediction is a difficult compu...
Publication date: 2015